Mosquitoes or Snakes — which scares you more?

Countries with Risk of Dengue

(Based on 2019 data)

Why are countries near the equator at higher risk of dengue?”

Favorable climate that speed up the mosquito life cycle

  1. Warm temperatures year-round

  2. High humidity and frequent rainfall

  3. No harsh winters

  4. Many equatorial countries have rapidly growing urban areas with poor waste and water management.

  5. People often live and work in open environments with limited protection, increasing exposure.

Countries with Risk of Dengue

Sri Lanka

Time Series Plot of Weekly Dengue Cases in Sri Lanka (National Level)

Time Series Plot of Weekly Dengue Cases in Sri Lanka (National Level)

Red Segment: Interrupted Period Due to the COVID-19 Pandemic

Data

Weekly Dengue Cases Corresponds to 25 Districts in Sri Lanka

Data Source

denguedatahub R package

On CRAN

install.packages("denguedatahub")

You could install the development version from Github using

install.packages("devtools")
devtools::install_github("thiyangt/denguedatahub")

Web scrapping using the denguedatahub R package

More about denguedatahub

link: https://denguedatahub.netlify.app/

Methodology: Methods of Forecasting

Methods of Addressing Interrupted Period

Approach 1

Excluding the interrupted period from model training

Approach 2

Forecasting the interrupted period first and then using it for modeling

Approach 3

Down-weighting interrupted period

Benchmark

  • Without considering the interruption effect

Analysis

R Pckages

# install.packages("devtools")
# devtools::install_github("thiyangt/denguedatahub")
library(denguedatahub)
library(tidyverse)
library(tsibble)
library(fable)
library(fabletools)
library(lubridate)
library(broom)

Data

Show the code
data("srilanka_weekly_data")
df <- srilanka_weekly_data |>
mutate(yearweek = yearweek(start.date))
duplicaterws <- df |> 
  mutate(
         yearweek = yearweek(start.date)) |> 
  duplicates(key = district, index = yearweek)
df_tsibble <- df |>
  distinct(district, yearweek, .keep_all = TRUE) |>
  as_tsibble(index = yearweek, key = district)
df_tsibble |> head() 
# A tsibble: 6 x 7 [1W]
# Key:       district [1]
   year  week start.date end.date   district cases yearweek
  <dbl> <dbl> <chr>      <chr>      <chr>    <dbl>   <week>
1  2006    52 12/23/2006 12/29/2006 Ampara       0 2006 W51
2  2007     1 12/30/2006 1/5/2007   Ampara       0 2006 W52
3  2007     2 1/6/2007   1/12/2007  Ampara       0 2007 W01
4  2007     3 1/13/2007  1/19/2007  Ampara       0 2007 W02
5  2007     4 1/20/2007  1/26/2007  Ampara       0 2007 W03
6  2007     5 1/27/2007  2/2/2007   Ampara       0 2007 W04
df_tsibble |> tail() 
# A tsibble: 6 x 7 [1W]
# Key:       district [1]
   year  week start.date end.date  district cases yearweek
  <dbl> <dbl> <chr>      <chr>     <chr>    <dbl>   <week>
1  2025     6 1/25/2025  1/31/2025 Vavuniya     3 2025 W04
2  2025     7 2/1/2025   2/7/2025  Vavuniya     2 2025 W05
3  2025     8 2/8/2025   2/14/2025 Vavuniya     2 2025 W06
4  2025     9 2/15/2025  2/21/2025 Vavuniya     0 2025 W07
5  2025    10 2/22/2025  2/28/2025 Vavuniya     5 2025 W08
6  2025    11 3/1/2025   3/7/2025  Vavuniya     0 2025 W09

Training vs Test Datasets

Training set

train_tsibble <- df_tsibble |> 
  filter(year(yearweek) < 2025)
train_tsibble 
# A tsibble: 24,466 x 7 [1W]
# Key:       district [26]
    year  week start.date end.date   district cases yearweek
   <dbl> <dbl> <chr>      <chr>      <chr>    <dbl>   <week>
 1  2006    52 12/23/2006 12/29/2006 Ampara       0 2006 W51
 2  2007     1 12/30/2006 1/5/2007   Ampara       0 2006 W52
 3  2007     2 1/6/2007   1/12/2007  Ampara       0 2007 W01
 4  2007     3 1/13/2007  1/19/2007  Ampara       0 2007 W02
 5  2007     4 1/20/2007  1/26/2007  Ampara       0 2007 W03
 6  2007     5 1/27/2007  2/2/2007   Ampara       0 2007 W04
 7  2007     6 2/3/2007   2/9/2007   Ampara       0 2007 W05
 8  2007     7 2/10/2007  2/16/2007  Ampara       0 2007 W06
 9  2007     8 2/17/2007  2/23/2007  Ampara       0 2007 W07
10  2007     9 2/24/2007  3/2/2007   Ampara       1 2007 W08
# ℹ 24,456 more rows

Test set

test_tsibble  <- df_tsibble |> 
  filter(year(yearweek) == 2025)
test_tsibble
# A tsibble: 234 x 7 [1W]
# Key:       district [26]
    year  week start.date end.date  district     cases yearweek
   <dbl> <dbl> <chr>      <chr>     <chr>        <dbl>   <week>
 1  2025     3 1/4/2025   1/10/2025 Ampara           6 2025 W01
 2  2025     4 1/11/2025  1/17/2025 Ampara           7 2025 W02
 3  2025     5 1/18/2025  1/24/2025 Ampara           3 2025 W03
 4  2025     6 1/25/2025  1/31/2025 Ampara           4 2025 W04
 5  2025     7 2/1/2025   2/7/2025  Ampara           3 2025 W05
 6  2025     8 2/8/2025   2/14/2025 Ampara           3 2025 W06
 7  2025     9 2/15/2025  2/21/2025 Ampara           5 2025 W07
 8  2025    10 2/22/2025  2/28/2025 Ampara           4 2025 W08
 9  2025    11 3/1/2025   3/7/2025  Ampara           0 2025 W09
10  2025     3 1/4/2025   1/10/2025 Anuradhapura    20 2025 W01
# ℹ 224 more rows

Results

Benchmark: Without considering the interruption effect

Show the code
tb_all_ARIMA <- train_tsibble |> model(arima = ARIMA(cases))
# A mable: 26 x 2
# Key:     district [26]
   district                         arima
   <chr>                          <model>
 1 Ampara                  <ARIMA(5,1,0)>
 2 Anuradhapura            <ARIMA(0,1,1)>
 3 Badulla                 <ARIMA(2,1,1)>
 4 Batticaloa              <ARIMA(1,1,3)>
 5 Colombo      <ARIMA(0,1,2)(0,0,1)[52]>
 6 Galle        <ARIMA(0,1,3)(0,0,1)[52]>
 7 Gampaha                 <ARIMA(0,1,2)>
 8 Hambanthota  <ARIMA(0,1,1)(0,0,2)[52]>
 9 Jaffna                  <ARIMA(0,1,5)>
10 Kalmune                 <ARIMA(0,1,5)>
# ℹ 16 more rows

Approach 1: Excluding the interrupted period from model training

Show the code
train_tsibble_excludecovid <- train_tsibble |> filter(year(yearweek) < 2025 & year(yearweek) > 2022 )
train_tsibble_excludecovid |> head()
# A tsibble: 6 x 7 [1W]
# Key:       district [1]
   year  week start.date end.date  district cases yearweek
  <dbl> <dbl> <chr>      <chr>     <chr>    <dbl>   <week>
1  2023     2 1/7/2023   1/13/2023 Ampara       7 2023 W01
2  2023     3 1/14/2023  1/20/2023 Ampara       8 2023 W02
3  2023     4 1/21/2023  1/27/2023 Ampara       1 2023 W03
4  2023     5 1/28/2023  2/3/2023  Ampara       1 2023 W04
5  2023     6 2/4/2023   2/10/2023 Ampara       7 2023 W05
6  2023     7 2/11/2023  2/17/2023 Ampara       1 2023 W06
train_tsibble_excludecovid |> tail()
# A tsibble: 6 x 7 [1W]
# Key:       district [1]
   year  week start.date end.date   district cases yearweek
  <dbl> <dbl> <chr>      <chr>      <chr>    <dbl>   <week>
1  2024    49 11/23/2024 11/29/2024 Vavuniya     3 2024 W47
2  2024    50 11/30/2024 12/6/2024  Vavuniya     2 2024 W48
3  2024    51 12/7/2024  12/13/2024 Vavuniya     4 2024 W49
4  2024    52 12/14/2024 12/20/2024 Vavuniya     4 2024 W50
5  2025     1 12/21/2024 12/27/2024 Vavuniya     2 2024 W51
6  2025     2 12/28/2024 1/3/2025   Vavuniya     4 2024 W52

Approach 1: Fitted model for each district

Show the code
tb_all_ARIMA_excludecovid <- train_tsibble_excludecovid |> model(arima = ARIMA(cases))
# A mable: 26 x 2
# Key:     district [26]
   district                      arima
   <chr>                       <model>
 1 Ampara       <ARIMA(1,0,3) w/ mean>
 2 Anuradhapura <ARIMA(0,0,3) w/ mean>
 3 Badulla      <ARIMA(1,0,1) w/ mean>
 4 Batticaloa           <ARIMA(0,1,0)>
 5 Colombo      <ARIMA(3,0,1) w/ mean>
 6 Galle                <ARIMA(0,1,1)>
 7 Gampaha              <ARIMA(0,1,1)>
 8 Hambanthota          <ARIMA(0,1,1)>
 9 Jaffna               <ARIMA(1,0,2)>
10 Kalmune              <ARIMA(3,1,0)>
# ℹ 16 more rows

Approach 2: Forecasting the interrupted period first and then using it for modeling

Step 1: Forecasting interrupted period

train_tsibble2 <- train_tsibble |> filter(year(yearweek) < 2020)
test_tsibble2  <- train_tsibble |> filter(year(yearweek) == 2020 | 
       year(yearweek) == 2021 | 
       year(yearweek) == 2022)
tb_all_ARIMA2 <- train_tsibble2 |> model(arima = ARIMA(cases))
fc2 <-   tb_all_ARIMA2 |> 
  forecast(h=157) |>
  mutate(.mean = round(.mean, 0))

Approach 2: Forecasting the interrupted period first and then using it for modeling (cont.)

Step 2: Replace interrupted period with forecasts

Show the code
train_tsibble_updated <- train_tsibble |>
  left_join(fc2 , by = c("district", "yearweek")) |>
  mutate(
    cases = if_else(!is.na(.mean), .mean, cases.x)  # Replace only if .mean is available
  ) |>
  select(-c(.mean, cases.x, cases.y))
train_tsibble_updated 
# A tsibble: 24,466 x 8 [1W]
# Key:       district [26]
    year  week start.date end.date   district yearweek .model cases
   <dbl> <dbl> <chr>      <chr>      <chr>      <week> <chr>  <dbl>
 1  2006    52 12/23/2006 12/29/2006 Ampara   2006 W51 <NA>       0
 2  2007     1 12/30/2006 1/5/2007   Ampara   2006 W52 <NA>       0
 3  2007     2 1/6/2007   1/12/2007  Ampara   2007 W01 <NA>       0
 4  2007     3 1/13/2007  1/19/2007  Ampara   2007 W02 <NA>       0
 5  2007     4 1/20/2007  1/26/2007  Ampara   2007 W03 <NA>       0
 6  2007     5 1/27/2007  2/2/2007   Ampara   2007 W04 <NA>       0
 7  2007     6 2/3/2007   2/9/2007   Ampara   2007 W05 <NA>       0
 8  2007     7 2/10/2007  2/16/2007  Ampara   2007 W06 <NA>       0
 9  2007     8 2/17/2007  2/23/2007  Ampara   2007 W07 <NA>       0
10  2007     9 2/24/2007  3/2/2007   Ampara   2007 W08 <NA>       1
# ℹ 24,456 more rows

Approach 2: Forecasting the interrupted period first and then using it for modeling (cont.)

Step 3: Use updated training set for forecasting

tb_all_ARIMA_updatedcovid <- train_tsibble_updated |> model(arima = ARIMA(cases))
tb_all_ARIMA_updatedcovid 
# A mable: 26 x 2
# Key:     district [26]
   district                         arima
   <chr>                          <model>
 1 Ampara       <ARIMA(0,1,2)(0,0,2)[52]>
 2 Anuradhapura            <ARIMA(3,1,3)>
 3 Badulla                 <ARIMA(1,1,4)>
 4 Batticaloa   <ARIMA(0,1,1)(0,0,1)[52]>
 5 Colombo      <ARIMA(0,1,3)(0,0,1)[52]>
 6 Galle        <ARIMA(0,1,3)(0,0,1)[52]>
 7 Gampaha                 <ARIMA(0,1,2)>
 8 Hambanthota  <ARIMA(0,1,1)(0,0,1)[52]>
 9 Jaffna                  <ARIMA(1,1,2)>
10 Kalmune                 <ARIMA(0,1,0)>
# ℹ 16 more rows

Forecasts

Benchmark: Without considering the interruption effect

fcb <-   tb_all_ARIMA |> 
  forecast(h=9) |>
  mutate(.mean = round(.mean, 0))

Approach 1: Excluding the interrupted period from model training

fc_a1 <-   tb_all_ARIMA_excludecovid |> 
  forecast(h=9) |>
  mutate(.mean = round(.mean, 0))

Approach 2: Forecasting the interrupted period first and then using it for modeling

fc_a2 <-   tb_all_ARIMA_updatedcovid  |> 
  forecast(h=9) |>
  mutate(.mean = round(.mean, 0))

Approach 3: Down-weighting interrupted period

\[forecast_{Approach 3} = 0.7 \times forecast_{\text{excluded}} + 0.3 \times forecast_{\text{included without extimating}}\]

#fc_a3 = (0.3*fcb$.mean) + (0.7*fc_a1$.mean)
fc_a3 <- fc_a1 |>
  mutate(.mean = 0.3 * fcb$.mean + 0.7 * .mean,
         .model = "fc_a3")
fc_a3
# A fable: 234 x 5 [1W]
# Key:     district, .model [26]
   district     .model yearweek
   <chr>        <chr>    <week>
 1 Ampara       fc_a3  2025 W01
 2 Ampara       fc_a3  2025 W02
 3 Ampara       fc_a3  2025 W03
 4 Ampara       fc_a3  2025 W04
 5 Ampara       fc_a3  2025 W05
 6 Ampara       fc_a3  2025 W06
 7 Ampara       fc_a3  2025 W07
 8 Ampara       fc_a3  2025 W08
 9 Ampara       fc_a3  2025 W09
10 Anuradhapura fc_a3  2025 W01
# ℹ 224 more rows
# ℹ 2 more variables: cases <dist>, .mean <dbl>

Model Comparison

fcb_accuracy <- fabletools::accuracy(fcb, test_tsibble)
fc_a1_accuracy <- fabletools::accuracy(fc_a1, test_tsibble)
fcb_a2_accuracy <- fabletools::accuracy(fc_a2, test_tsibble)
fcb_a3_accuracy <- fabletools::accuracy(fc_a3, test_tsibble)

Output only for fcb_accuracy for illustration

fcb_accuracy
# A tibble: 26 × 11
   .model district   .type      ME  RMSE   MAE     MPE  MAPE  MASE RMSSE    ACF1
   <chr>  <chr>      <chr>   <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
 1 arima  Ampara     Test    0.679  2.08  1.61 -Inf    Inf     NaN   NaN  0.0940
 2 arima  Anuradhap… Test    7.85  11.2   8.28   24.1   26.8   NaN   NaN  0.190 
 3 arima  Badulla    Test    8.14  11.6   9.66   16.9   63.2   NaN   NaN  0.552 
 4 arima  Batticaloa Test   -4.67   9.13  8.14   -9.06  13.5   NaN   NaN -0.160 
 5 arima  Colombo    Test   38.4   43.3  38.4    15.2   15.2   NaN   NaN -0.294 
 6 arima  Galle      Test   20.6   25.4  21.8    37.0   42.7   NaN   NaN  0.122 
 7 arima  Gampaha    Test  -62.5   68.1  62.5   -45.4   45.4   NaN   NaN  0.358 
 8 arima  Hambantho… Test   -6.62  10.3   9.34  -55.2   63.3   NaN   NaN  0.636 
 9 arima  Jaffna     Test  -24.1   26.3  24.1   -88.5   88.5   NaN   NaN  0.146 
10 arima  Kalmune    Test   -6.53   8.42  6.78  -58.4   60.0   NaN   NaN  0.138 
# ℹ 16 more rows